Steps followed

  1. Exploratory Data Analysis
  2. Data Cleaning and Preparation
  3. Model building, evaluation, parameter tuning
  4. Make predictions

Read Data

EDA

Check class imbalance
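A minimal sketch of a class-imbalance check, assuming the data is in a pandas DataFrame. The column name `target` and the toy data are hypothetical, not taken from the notebook.

```python
import pandas as pd

# Hypothetical data: `target` is an assumed name for the label column.
df = pd.DataFrame({"target": [0] * 90 + [1] * 10})

# Share of rows per class; a heavily skewed split signals imbalance.
class_share = df["target"].value_counts(normalize=True)
print(class_share)
```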

Do some basic sanity checks

Check null values
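A null-value check might look like the sketch below. The DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with missing entries in column "a" (assumed names).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan], "b": [1, 2, 3, 4]})

# Count and percentage of missing values per column.
null_counts = df.isnull().sum()
null_pct = df.isnull().mean() * 100

# Columns that need attention during data preparation.
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(null_counts, null_pct, cols_with_nulls, sep="\n")
```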

Check for duplicate values
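One way to count and remove fully duplicated rows with pandas, shown on a toy DataFrame (the data is illustrative only):

```python
import pandas as pd

# Toy data: the first two rows are identical.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# duplicated() flags every repeat of an earlier row.
n_duplicates = int(df.duplicated().sum())

# Keep the first occurrence of each row and rebuild the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(deduped))
```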

Drop columns with a single value (no predictive power)
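A column that holds the same value in every row cannot help separate the classes. A sketch of how such columns could be found and dropped, using assumed column names:

```python
import pandas as pd

# Toy data: "const" never varies, "x" does (assumed names).
df = pd.DataFrame({"const": [7, 7, 7], "x": [1, 2, 3]})

# A column with at most one distinct value carries no signal.
# dropna=False counts NaN as a value, so an all-NaN column is also caught.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]

reduced = df.drop(columns=constant_cols)
print(constant_cols, list(reduced.columns))
```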

Correlation plots
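The sketch below computes the pairwise correlation matrix that a correlation plot would visualize. The plotting call itself is left out, and the toy columns are assumptions.

```python
import pandas as pd

# Toy numeric data: "y" is exactly 2 * "x", so the two are perfectly correlated.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))
```

The resulting matrix can be passed to a heatmap function of whichever plotting library the notebook uses.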

Data Preparation

Check the distribution of columns with missing values

Modeling

Define functions for plots

Random Forest Classifier

Grid search with cross-validation to find optimal hyperparameters
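A minimal sketch of a cross-validated grid search over a random forest with scikit-learn's `GridSearchCV`. The synthetic data, the parameter grid, and the choice of `roc_auc` as the scoring metric are assumptions, not the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared training data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Illustrative grid; the real search space is not given in the source.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # assumed metric
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```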

Support Vector Machines
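A support vector classifier could be fitted as sketched below. Feature scaling, the RBF kernel, and the synthetic data are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the prepared data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize before fitting.
# probability=True enables predict_proba, which AUC computation needs.
svm_model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", probability=True, random_state=0),
)
svm_model.fit(X_train, y_train)

test_proba = svm_model.predict_proba(X_test)[:, 1]
```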

Handle class imbalance

Balanced Bagging Classifier

Random undersampling boosting (RUSBoost) classifier

Validation Set Prediction

Check for null values in validation set

Fill in missing values in validation set with training data mean
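Filling the validation set with means computed on the training data avoids leaking validation statistics into the pipeline. A minimal sketch with made-up frames and column names:

```python
import numpy as np
import pandas as pd

# Toy frames standing in for the training and validation sets.
train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, np.nan, 30.0]})
validation = pd.DataFrame({"a": [np.nan, 5.0], "b": [np.nan, 40.0]})

# Column means from the training data only (NaN is skipped by default).
train_means = train.mean()

# fillna with a Series matches values to columns by name.
validation_filled = validation.fillna(train_means)
print(validation_filled)
```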

Use the random undersampling boosting classifier, since it has the highest AUC and its train and test AUC are almost the same
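The train versus test AUC comparison behind this choice can be sketched with a small helper. `auc_gap` is a hypothetical name, and `LogisticRegression` is only a stand-in model on synthetic data, not the classifier selected above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def auc_gap(model, X_train, y_train, X_test, y_test):
    """Return (train AUC, test AUC, train minus test) for a fitted classifier."""
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return train_auc, test_auc, train_auc - test_auc


# Synthetic data and a stand-in model, for illustration only.
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

train_auc, test_auc, gap = auc_gap(model, X_train, y_train, X_test, y_test)
print(round(train_auc, 3), round(test_auc, 3), round(gap, 3))
```

A small gap suggests the model generalizes about as well as it fits the training data, while a large positive gap points to overfitting.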